Yuhao Dong

From Pixels to Words -- Towards Native One-Vision Models at Scale

May 27, 2026
Via arXiv

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

May 25, 2026
Via arXiv

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

May 18, 2026
Via arXiv

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Apr 06, 2026
Via arXiv

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Apr 06, 2026
Via arXiv

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Mar 27, 2026
Via arXiv

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Mar 18, 2026
Via arXiv

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Mar 16, 2026
Via arXiv

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Feb 09, 2026
Via arXiv

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026
Via arXiv